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7.0 INTRODUCTION 


The study of mathematics is more than an interesting intellectual exercise. 
It also has many applications in physical science, engineering and techno- 
logy, social science, medicine and other fields. Such applications frequently 
lead to mathematical problems whose solutions are not as easy to calculate 
as the specially constructed ones you have met so far in this course. For 
example, it is easy enough to find the inverse of a 3 x 3 matrix using 
Hermite normal form—but how would you find the inverse of a 30 x 30 
matrix? The answer “ use a computer” doesn't fully deal with the question. 
What do you tell the computer to do, in order to be sure (a) that the answer 
the computer gives you is accurate enough and (b) that you are not wasting 
valuable computer time by using an unnecessarily slow method? The 
search for answers to such questions has led to the development of a branch 
of mathematics called numerical analysis. In the Foundation Course 
(M100), we discussed some of the elementary ideas of numerical analysis. 
In particular, Unit M100 2, Errors and Accuracy and Unit M 100 28, Linear 
Algebra IV were almost entirely devoted to it. There were also parts of 
the other units, like Unit M100 4, Finite Differences, Unit M100 9, Inte- 
gration I, Unit M100 14, Sequences and Limits II and Unit M100 24, Dif- 
ferential Equations I, which included a considerable amount of numerical 
work. Explicit reference will be made to the Foundation Course units 
where necessary. 


In the numerical analysis units of this course we will discuss accurate and 
economical methods for the calculations in linear mathematics, such as 
the calculation of matrix inverses, the solution of systems of linear equa- 
tions, the calculation of eigenvalues and eigenvectors, and the solution 
of differential equations. 


One feature that all these methods have in common is that they depend on 
arithmetical processes which are repeated many times during the calcula- 
tion, and which usually introduce small round-off errors at each step. There 
is a tendency for such errors to accumulate, and sometimes even to be 
magnified as the calculation proceeds, and in consequence it is quite 
possible for a badly designed calculation to give wildly wrong answers. 
For example, you will see later in this unit how a plausible method for the 
= 


apparently simple problem of evaluating the integrat Í x*e*^! dx gives the 


o 
answer —0.7280 whereas the correct answer is about +0.1; the calcula- 
tion gives an answer with the wrong sign, and about 7 times too big. This 
disastrous result is caused by the magnification of a small round-off error, 
less than 0.00005 in magnitude, in the initial data used to start the calcu- 
lation. 


The main purpose of this unit is to show you how such unpleasant possi- 
bilities come about and how they can be controlled and avoided. We shall 
discuss them in the' context of the simplest calculation involving a repeti- 
tive process—the evaluation of a sequence of numbers from a recurrence 
formula (or recurrence relation as it is often called), such as the formula 
F, = F,., + F,-2 involved in the Fibonacci sequence, which we mentioned 
in Unit 5, Determinants and Eigenvalues. 


In sub-section 7.2.3 we explain why the problem of solving a recurrence 
relation is a linear problem, by considering the vector space whose elements 
are all sequences of real numbers. 


7.1 INSTABILITIES 
7.1.0 Introduction 


By the term instability we refer to any situation where a small error in a 
number at one stage of the calculation is magnified to give a large error in 
the answer. There are two main types of instability; one arising from the 
nature of the problem for which we are trying to compute a solution, the 
other from the method we use to solve the problem. We consider them in 
turn. 


7.1.1 Ill-conditioned Problems: Inherent Instability 


So far in this course we have taken it for granted that, whatever calculation 
we are doing, we start with well-defined data and that we obtain a well- 
defined correct result, even though this may be difficult to calculate 
exactly. Not all calculations, however, are of this type. For example, in 
many practical problems some of the data have been obtained by the use 
of measuring apparatus, such as rulers or voltmeters. We saw in Unit M100 
2, Errors and Accuracy how to represent this situation mathematically: 
if the result of some measurement is a number x and the accuracy of that 
measurement is described by the absolute error bound e, then all we can 
say about the true value of the measured quantity is that it lies in the 
interval [x — e, x + c]. Now, the quantity we want to calculate depends on 
the true value of x, which we may call X; but since we do not know X 
exactly we cannot find the exact answer. Suppose, for example, that the 
object of our calculation is to evaluate the image of X under some func- 
tion f. Then the best we can do is to calculate the image of the interval 
[x — e, x+e], and assert that f(X) lies somewhere in this image set. 
Depending on the nature of the function f and the value of e, this assertion 
may or may not provide accurate information about the value of f(X). 


Example 1 


Suppose a measurement of a physical quantity X gives the value x É20 
with an accuracy of 2 significant figures. What can be said about the value 
of X19? 


Since we know only that X e [1.95, 2.05], all we can deduce is that 
X? © [(1.95)!5, (2.05)!°] = [794, 1311] 
Thus, although we can compute x!? as 21? — 1024, we cannot even be sure 
that the first digit of this answer is correct. 
Example 2 


Suppose our measurement is as before, but this time we are interested in 
X1. This time we deduce X!/!9 e [(1.95)!/10. (2.05)!/1^] ~ (1.069, 1.075] 
and so the value of x!/!?, which is 1.072 to 3 decimal places, is in eter 
by at most 0.3 %, and its first 3 digits are certainly correct. Thus we should 
be justified in quoting the answer as X?/!? = 1.07; but to quote any more 
figures would be unnecessary and possibly misleading. 


Example 3 
In the quadratic equation 
t? — 2.0291 + 1.028 =0, 


suppose that the number 2.029 is known exactly, but that all we know 
about the number 1.028 is that it is correctly rounded, i.e. that it lies in 
the interval [1.0275, 1.0285]. Then all we can say about the larger of the 
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two roots is that it lies between the larger roots of t? — 2.0291 + 1.0275 = 0 
and t? — 2.0291 + 1.0285 = 0, i.e. between 1.0412 and 1.0559. Uncertainties 
in the fifth significant figure of the data produce uncertainties in the third 
significant figure of the answer. 


Three important points emerge from these examples. 


(i) Any “rule of thumb” that the accuracy in the answer is roughly the 
same as that of the data, is quite unfounded. 

(i) In Examples 1 and 3 the errors in the answers were much larger 
than those in the data, whereas in Example 2 we had the opposite 
effect. We say that problems like Examples 1 and 3, in which any 
errors in the data are magnified, are ill-conditioned, whereas problems 
in which the errors are diminished are well-conditioned. 

(iii) This effect depends only on the problem we are attempting and not 
the method used to calculate its solution. For this reason, ill-condi- 
tioning is sometimes referred to as inherent stability, to distinguish 
it from instabilities that are not inherent in the problem itself. 


Sometimes we can give a numerical measure of the degree of conditioning. 
For the problem of computing images under a real function, a natural 
choice is the magnification factor by which an error e in the domain must 
be multiplied to give the corresponding error in the codomain. If fis differ- 
entiable and e is sufficiently small, then 


f(x e) e fex) + of’) 
(see Unit M100 14, Sequences and Limits II). 


ffo) 


tan a = f'(x) 


Aj UB 
x x*e x 


BD>AE+FC 
f(x e) = f(x) + ef'(x) 


Thus, the error in the codomain is very close to ef (x), that is, the magnifi- 
cation factor is very close to the derivative f'(x); but, since we are not 
interested in the sign of the error, we shall take the magnitude |f'(x)| as 
our measure of conditioning. 


Since it relates absolute rather than relative errors*, we may call |f'(x)] 
the absolute condition number for the problem of calculating f(X). If the 
absolute condition number is much larger than 1, the problem is said to be 
(absolutely) ill-conditioned; if much smaller, then it is (absolutely) well- 
conditioned. 


* For absolute and relative errors see Unit M100 2, Errors and Accuracy. 


An alternative measure of conditioning is the relative condition number, 
defined by using thé ratio of the relative rather than the absolute errors 
in codomain and domain. The relative error in the domain is defined as 
e[x where e is the error in x, and that in the codomain is approximately 
ef (XM f(x) (for small e); so the relative condition number is 


FOA | _ | x69) 
elx F(x) 


If this number is larger than 1 the problem is said to be relatively ill- 
conditioned; if smaller, it is relatively well-conditioned. 


Although we have introduced the idea of ill-conditioning here in the con- 
text of a physical problem whose initial data come from an (inaccurate) 
physical measurement, it is also useful in purely mathematical problems, 
where the problem uses only mathematical data, such as the values of 
numbers like x or e or 4/2, or even 4. If we look up such numbers in a 
book of tables, or rely on standard computer subroutines to calculate 
them, the values we obtain will contain small rounding errors which may 
be magnified to unacceptable proportions if the problem is ill-conditioned. 
In this mathematical case, however, we are better off than in the physical 
problem, because there is no physical limitation on the accuracy of the 
data—by taking sufficient trouble we can make these mathematical data 
as accurate as we like. Thus in the case of a physical problem, ill-condi- 
tioning imposes absolute restrictions on the accuracy of our result, but in a 
mathematical problem it is merely a warning that special care will be 
needed to get high accuracy. 


Example 4 


Suppose that we wish to compute y = e!? for e = 2.71828 ... , and that 
we wish to have the first three figures correct in the computed result. 
Taking successive correctly rounded approximations to e, we obtain the 
rounded values 


(3)? — 0.5905 x 105 
(2.7)? = 0.2059 x 105 
(2.72)? = 0.2217 x 10° 
(2.718)? = 0.2200 x 105 
(2.7183)'° = 0.2203 x 105 


and this last answer is correct to three significant figures. The problem is 
ill-conditioned, since we must start with at least 5 correct figures in the 
" data" to be sure of three correct figures in the answer; but because it’s 
a mathematical rather than a physical problem, we are able to get the 
accuracy we require by increasing the accuracy of the “data ” beyond the 
3 figures we might have expected before starting the calculation. 


Exercises 


l. Calculate the absolute and relative condition numbers for Examples 1 
and 2. Check that they correctly predict ill-conditioning where it 
occurs. 


2. Ify is defined, for any given x (less than (1.0145)?), as the larger solu- 
tion of the quadratic equation 


y! —2.029y +x =0, 


show that the absolute condition number for calculating y from a 
given value of x is (2y — 2.029)7!. Hence show that when y — 1.028 
the problem is ill-conditioned. 

3. Is the mathematical problem of computing e°, with e = 2.71828 
given, ill- or well-conditioned ? UU 


Solutions 


T. 


In Example 1 the function is x ——— x!° and the value of x is 
2. The absolute condition number is 10 x 2? — 5120 and the 
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relative condition number is 2 x oe =10. By either 


criterion the problem is ill-conditioned. 


In Example 2 the function is x + x!/!? and the value of 
x is again 2. The absolute condition number is 
1/10 
1 aano- _ 277 1072 


= = 0.05, 
10 20 20 9, 


and the relative condition number is zs. By either criterion the 
problem is well-conditioned. 
The quickest way of getting f(x), where y — f(x), is as follows. 
The quadratic can be written 


IO)? — 2.029f(x) + x = 0, x < (1.0145)? 
Differentiating, we get 
2fGof'(x) — 2.029f'() - 120 
and rearranging, this gives 
1 
2.029 — 2f(x) 


Since the solutions are 1.0145 +. /somethin, , the larger solu- 
tion is greater than 1.0145; so 2f(x) exceeds 2.029 and so the 
absolute condition number is 
1 2 1 
2f(x) — 2.029 2y — 2.029 
When y = 1.028 the condition number is 
1 "E 
2.056 — 2.029 0.027 — 


so that the problem is ill-conditioned. 
The absolute condition number is f“(e), where: x ———5» x!/10; 
that is 


fe 


I'o] = 


37, 


f= 5 e7910 ~ 0.04 


and so the problem is well-conditioned. 


7.1.2 Induced Instability: Good and Bad Numerical 
Methods 


We now turn to the second main source of instability. We have already 
mentioned, and we shall demonstrate more clearly in the next section, 
that computer arithmetic is not exact. Almost every arithmetical operation 
produces an error, and the combined effect of these errors needs careful 
analysis. In some methods the build-up of arithmetic errors is so great that 
the computed result may be a very poor approximation to the true solu- 
tion even in a well-conditioned mathematical problem, and in a physical 
problem it may lead to results well outside the true solution interval. 
This effect is called induced instability; do not confuse it with ill-condi- 
tioning: ill-conditioning is a feature of the problem, whereas induced 
instability is a feature of the method used to solve the problem. 


It is perhaps a surprizing fact that many of the most obvious numerical 
methods can exhibit induced instability. We shall see many examples as 
the course proceeds. 


Example 1 


We wish to compute, to four significant figures*, both roots of the quadratic 
equation 


x? — 2x + 0.009 = 0 


in which all the coefficients are exact. How do we do this using a computer 
which stores only four decimal digits? 


The usual formula for the roots of a quadratic equation gives 


x, =1+,/0.991 
= 1 + 0.9955 (in our machine) 
= 1,996, correct to four significant figures. 
x2 =1 - Vl 0.991 
= ] — 0.9955 
= 0.0045, correct to only two significant figures. 


Thus the method has lost two significant figures; it exhibits mild induced 
instability. How can we get the required 4-figure accuracy in x2? The 
reason for the loss of accuracy in the above method is that x, was ob- 
tained by subtracting two nearly equal quantities. To get 4-figure accuracy 
we use a different method. A good method is to use the formula for the 
product of the solutions of a quadratic equation. The product of the 
solutions of x? + bx + c — 0 is c, and so, knowing x,, we have 


xz = 0.009/x, = 0.009/1.996 = 0.004 509. 


This is almost correct to 4 significant figures (i.e. the absolute error bound 


of the 4-figure result given is not much greater than 1 in the last decimal 
place). 


Exercises 


l. Justify the statement made above, that the result X3 = 0.004 509 is 
almost correct to 4 significant figures, by estimating a relative error 
bound (i.e. a bound on the relative error in X3). 


(Remember from Unit M100 2, Errors and Accuracy, that a relative 


error bound for a quotient is the sum of the relative error bounds for 
the numerator and denominator.) 


* If necessary, see definition of significant figures in Unit M100 2, Errors and Accuracy, 


2. Given that. /40 — 6.32 and Jal = 6.40, correct to 3 significant figures 
in each case, calculate Jn — 40 


(a) directly 
(b) by using the fact that 


(VA — /40) (VA 4/40) - 1 


(Use a table of reciprocals.) 


Estimate the relative error in both answers. 


Which method is preferable? 


Solutions 


l. 


In the quotient 0.009/1.996, the numerator is exact and the 
denominator 1.996 has absolute error bound 0.0005 and rela- 
tive error bound 0.0005/1.996 — 0.000 25 approximately. The 
relative error bound for the quotient 0.004 509 is therefore 
0.000 25 too, and so the absolute error bound is 0.004 509 
x 0.000 25. This is less than 0.12 x 1075, i.e. not much more 
than 1 unit in the last significant figure of the computed x;. 


(à) /41 —./40 = 6.40 — 6.32 = 0.08 
1 1 
© t= ere a Uem 
ES 
1272 


In (a) the absolute error bounds for 6.40 and 6.32 are 0.005, so 
that an absolute error bound for 0.08 is 0.01, corresponding 
to a relative error bound of $. In (b) the absolute error bound 


= 0.078 62 


‘for 12.72 is 0.01 for the same reason as in (a), but this corre- 


sponds to a relative error bound of 0.01/12.72 which is less 
than 107%, and so the relative error bound for 0.078 62 is also 
less than 107?. Thus method (b) is more than 100 times as 
accurate as method (a). Once again, subtracting nearly equal 
numbers leads to a poor result. 


1 


7.1.3 Floating-point Number Storage 


In order to devise methods of computation that will avoid induced insta- 
bility and, more generally, to discover how accurate the results of our 
computation are, we want to be able to analyse the propagation of errors 
in a computation. These errors arise from the fact that the computer 
cannot in general carry out arithmetical operations exactly, and this in 
turn arises from the fact that the computer cannot store numbers exactly. 
To see how this limitation gives rise to inexact arithmetical operations, let 
us consider a (hypothetical) four-figure decimal machine working in 
floating-point arithmetic. This means that numbers are stored in the form 
10° x a, where b is a positive or negative integer, and |a| is a four-figure 
number in the interval (0.1, 1) 2 ([a]: 0.1 < |a| <1}*. For example, 
5764 is stored as 10* x 0.5764, 0.005 764 as 107? x 0.5764, 5.764 as 
10! x 0.5764, and so on. We shall use the symbol fl(x) to mean our floating- 
point form of the number x; so that, for example, fi(5764) = 10* x 0.5764. 


The first step, of course, is to get the data of the problem into our machine. 
When any number is fed in it is automatically rounded to four significant 
digits and then expressed in floating-point form. For example, the number 
Jis stored as 10? x 0.3333, the number ;!- = 0.090 909... as 107! x 0.9091, 
the number z = 3.141 59 ... as 10! x 0.3142, and so on. 


Notice that a whole range of numbers will have the same stored value. 
For example, every number x in the interval [0.576 55, 0.576 65) is stored 
as fi(x) = 10° x 0.57661. The maximum error in the decimal part a is 
0.00005 = 5 x 1075, so that the maximum absolute error in the stored 
number 10° x a is 5 x 10*75, More commonly, we investigate the maxi- 
mum relative error, and since the stored number is at least 0.1 x 10°, it 
follows that the relative error is at most (5 x 10*75)/(0.1 x 10°), that is 
5 x 107^. Thus, the introduction of a storage error in an item of data x 
can be described by the formula d 


A(x) = x(1 + ra). 


The quantity r, defined by this formula is called the relative error. It 
depends on x (and may be zero) and has upper bound given by 


[r| <5 x 1074. 
Exercises 


1. Evaluate r, in the formula fi(x) = x(1 + rą), with (a) x = 4, (b) x 24r, 


(c) x =1, for our fictitious four-figure decimal machine, and check 
that each |r,| is less than the upper bound given above. 
2. Ina binary machine, the number x is stored in the floating-point form 


fl) 22^ xa, where ix|a|«1, 


and |a| is stored as a binary number; for example, 0.110 010 1 ... 01. 
If the machine can store t binary digits for |a|, show that 


fig) 2x0 rj), where |r,| «27*. 
Solutions 
1. (a) 10° x 0.3333 = 4(1 + r,) gives 
r = 0.9999 — 1 = — 0.0001 
(b) 107 x 0.9091 = 4, (1 + r,) gives r, = 0.00001 


* In Unit M100 8, Computing I we used a slightly different convention for floati. i 
representation; there [a| was a six-figure number in the interval U1, 10]. eB Point 
T See Unit M100 2, Errors and Accuracy, pp. 7-8. 
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(c) 10! x 0.3142 = a(1 + r,) gives 


3.142 0.000 41 


"= 374759 7344159 ^ 0000 13 


In each case |r] <5 x 1074. 

2. The absolute error bound for |a| is 4 x 2~' or 27'^!. The 
absolute error bound for 2° x a is therefore 2°-'~!. The rela- 
tive error bound is 


ge-t-1 | 3-11 


a «27 
2 xa d 


since a > 2^ !. Thus we have shown that 


Irs] «277. 


7.1.4 Floating-point Arithmetic 


The next point to examine is how the machine performs arithmetic on its 
numbers, already stored (in our machine) in four-figure floating-point 
form. The details vary in different machines, but here we describe the 
most common methods. 


(a) Addition and Subtraction 


If x, = 10" x a,, x; =10" x a,, with b, 2 bz, then the machine first 
evaluates ' 


I0^(a, + 102 7^ x aj) 


to eight digits and rounds it to four significant digits (in our machine), 
and then adjusts the exponent (if necessary) to produce the result in 
standard floating-point form. For example, 
(i) f{10? x 0.5765 + 10° x 0.2946) = f1{107(0.5765 + 0.002946) 
= f1(102(0.579 446)} 
= 10? x 0.5794 


(ii) f(10? x 0.5765 + 10? x 0.4826} = fi{102(0.5765 + 0.4826)} 
= fi{107(1.0591)} 
= 10? x 0.1059 


(iii) f1{10? x 0.1024 — 10' x 0.9048} = fi(10?(0.1024 — 0.09048)) 
= fi{107(0.011 92)} 
= 10! x 0.1192 


(b) Multiplication 


With x, and x; in floating-point form as above, the machine forms 
10^: * (a, x a5), then rounds the number in brackets to four significant 
digits (in our machine), and finally adjusts the exponent (if necessary) to 
give the result in standard floating-point form. For example, 


(i) f((10? x 0.5765) x (10° x 0.5765) 


= fi(10? x 0.33235225) 
= 10? x 0.3324 


(i) (10? x 0.5765) x (107? x 0.1112) 
= fi(10(0.064 049 15)} 
= 10? x 0.6405 
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(c) Division 

The quotient x,/x, is calculated as follows. If |a, | < |a,|, the machine cal- 
culates a,/a; to eight-figure accuracy, and since the first digit cannot be 
zero the floating-point result is 


10% 7 x the rounded value of 
2 
If |a | > azl, we divide 31; a, by a2, with suitable adjustment of the expo- 
nent b; , and the floating-point result again follows easily. For example, 
G) f{(10? x 0.5765) + (10? x 0.6294)} 
= fi(10! x 0.915951 70} 
= 10! x 0.9160 


(i) f((10? x 0.5765) + (107? x 0.4968)} 
= fi{(10* x 0.057 65) + (107? x 0.4968)} 
= fi((105 x 0.116042 67)} 
= 105 x 0.1160 


As in the case of the floating-point storage of data, so here we can obtain 
an upper bound for the relative error in an arithmetic operation. 


Defining this relative error r by 
fl 0x2) = GG exa) +r), 


where ois +, —, x or +, r can be shown, once again, to have the upper 
bound 5 x 1074. 


Computers use binary rather than decimal arithmetic. Numbers are stored 
in the form 2° x a, where |a| is a binary number in the interval [4, 1). 
Our basic error formulas then become fi(x) = x(1 + ry), |r.| € 2^' (see 
Exercise 2 of sub-section 7.1.3.), fi(x; 0x2) =(x,°x2)(1 +r), |r] <275 
where 1, the so-called ‘“‘word-length” of the machine, is the number of 
binary places reserved for the storage of the fractional part of the number. 


Various results of floating-point arithmetic conflict with our normal ex- 
pectations and mathematical ideas. For example, if x, and x, are positive 
with x, > x2, we expect that x, — x; is also positive but smaller than x, . 
In floating-point arithmetic this may not be true. For example, if we use 
our hypothetical decimal machine to compute x, — xz, with x, = 14.19501, 
X3 = 0.00497, the machine first stores 


fl(x;) = 107? x 0.4970, fi(x,) = 10? x 0.1420, 
and then performs the floating point subtraction to obtain 
fi{(10? x 0.1420) — (107? x 0.4970)} 
= fi(107(0.1420 — 0.000049 70) 
= f1{107(0.141 950 30)} 
= 10? x 0.1420 = 14.20, 


which is larger than the original x,! 


Moreover, with exact arithmetic there are useful properties like associa- 
tivity, which ensures that the order of some numerical operations is 
immaterial: 


ad (b--c) 2 (a- b) +e. 

a(bc) — (ab)c. 
With a succession of floating-point operations, we find that different 
orderings produce (in general) different results. Although this is not a 


serious blemish on computer arithmetic, we may, in extreme cases, wish 
to use the optimum ordering. 
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Exercise 


Evaluate 3827 + 12.54 + 1.567 using four-decimal floating-point arith- 
metic 


(i) starting at the left 
(i) starting at the right. 


The exact value of the sum is 3841.107. Which result is the more accurate? 
Can you say (without detailed error analysis) why it is best to start the 
summation with the smallest pair of numbers? 


Solution 
Method (i) First, 3827 + 12.54 is evaluated as 


fi(10* x 0.3827 + 10? x 0.1254} 
= fi{10*(0.3827 + 0.001 254)} 
= fi{10* x 0.383 954} 
= 10* x 0.3840. 


Adding in the last term gives 


fi{10*(0.3840 + 0.000 156 7)} = fi(10* x 0.384 156 7) 
= 10* x 0.3842. 


Method (ii) First, 1.567 + 12.54 is evaluated as 


f1(107(0.015 67 + 0.1254)) = fi(10? x 0.141 07} 
= 10? x 0.1411. 


Adding in the last term gives 


f1(10*(0.001 411 + 0.3827)} = fi(10* x 0.384 111} 
= 10* x 0.3841. 


If e, and e; are bounds on the absolute errors introduced by the 
two additions, an absolute error bound for the result is e, + €z. 
In the text it is stated that an error bound for a floating-point 
addition operation is 5 x 1074 times the result. To minimize the 
absolute error introduced by the first addition, we should make 
its result as small as possible, i.e. perform 12.54 + 1.567 not 3827 
+ something. (The error in the second addition cannot be altered 
much, since the sum is bound to be about 3841.) 
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7.4.5 An Elementary Case Study 


To illustrate the various points made so far in this unit, let us consider 
the problem of evaluating the integral 


1 
f xe") dy, 
o 


Suppose, as is the habit of mathematicians, that we generalize this a bit 
and consider the integral 


1 
=e! [ xe dx,* 
0 
and enquire how we might compute 7, for r — 0, 1, 2, ... . There are at 
least four possible methods. 


(i) Finite series Using the Techniques of Integration handbook, TI, we 
find that J, can be expressed as the finite series 


L-l-rcr(r-1)- + (- 1)! 
+(-1yrid — e74, 


(see Exercise 1, at the end of this sub-section). 


This is in a sense a reformulation of the problem. Is it satisfactory for 
numerical purposes? We can certainly evaluate the series for any 
value of r, but how accurate will be the result computed by our 
hypothetical four-digit decimal machine? 


Trying r = 6, we get 
Ig =1—6 + 30 — 120 + 360 — 720 + 720(1 — e~’). 


Now e^! is 0.367879 ..., but our machine can store only the 
rounded version 0.3679, with an error of 0.000 020 ... . The error 
in the computed 7, is then 720 times this error, about 0.014, and 
since J, turns out to be about 0.127 we have a relative error of more 
than 10%. For I5, for which the factor multiplying (1 — e^!) is 
362 880, the error from this source is as much as 7.2, many times 
greater than the true value of J,, which is about 0.09. 


For the computation of J, there is another source of error. The 
series is 


1 — 9 + 72 — 504 + 3024 — 15 120 + 60 480 — 181 440 
+ 362 880 — 362 880(1 — e^), 


and the last three terms cannot be stored accurately in floating-point 
form. 


Using the word “formulation ” in a slightly more general sense than 
previously, it is clear that we have formulated an ill-conditioned 
mathematical problem (since small errors in the “data” e^! are 
multiplied by large numbers to produce large errors in the answer), 
and that the method of solution leads to induced instability (since 
rounding errors in the calculation of such numbers as 362 880 are 
large and can seriously affect the computed result). 


(ii) Numerical integration A possibility for avoiding the difficulties we 
encountered above in evaluating J, is to compute it, for each r, by 
numerical integration, using, for example, Simpson's rule (see Unit 
M100 9, Integration I). The formula is 


* The evaluation of this integral is the subject of a Computer Exercise contained in the sup- 
plementary booklet for this unit. 
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(iii) 


I; SO) dx =th (fo -- fit 2f, ^ 4f, 


* 2f oo Ban-2 + Aaa fu) 
where 2nh = 1, and f, = f(kh) (k —0, 1, ..., 2n). 


In our particular case this becomes 
I. Xe dx = 3 h(0 + A()e^ + 2(2hy'e?" +--+ + e) 


and in order to obtain 7,, we have to multiply this by e^!. This is a 
better idea, the reformulated problem being well-conditioned (any 

"error" in e^? clearly produces only the same relative error in the 
result), and the method has little induced instability. In fact, the 
integrand is positive everywhere and the addition of positive numbers 
cannot, through rounding errors, produce a large relative error; it 
was the cancellation of large numbers in method (i), producing a 
small result with an absolute error appropriate to a large result, 
which caused the induced instability. 


On the other hand, we now have a lot of computation, including 
many evaluations of e*. Moreover, Simpson’s rule is not exact for 
this integrand, and we either have to estimate the error by some non- 
trivial analysis or, more practically, use a sequence of decreasing 
intervals / in the Simpson formula. As / approaches 0 the Simpson 
approximation approaches the true value of the integral, and we can 
estimate the accuracy by inspection. For example, with h = 4, 3 and 
4 the four-figure Simpson-rule estimates of J, are respectively 0.1730, 
0.1312 and 0.1271, and A = 4l; gives 0.1268, and we have some con- 
fidence that this is correct to four figures. (The rate of convergence 
to the true result can be improved by a device called Romberg inte- 
gration, which you will find described in Computing Methods for 
Scientists and Engineers, by Fox and Mayers (see Bibliography). 


A recurrence formula In methods (j) and (ii) the computation of 
I, was performed independently for each value of r; but when we 
want to calculate a number which can be regarded as a member of a 
sequence, in this case the sequence Jy, 7, , J2, ... it is often useful to 
look for a formula connecting each element to one or more of its 
immediate predecessors in the sequence. In Unit M100 7, Sequences 
and Limits I, we called these formulas recurrence formulas; but they 
are more commonly called recurrence relations. For the sequence 
Io, Ij ,..., a recurrence relation can be found using integration by 
parts (see Exercise 1 of this sub-section), viz: 


L=1 =r, ((21,2,3,..) 


If we know one member of the sequence, say Jy, we can compute 
L, Dj, ... in succession, from this recurrence relation. So here is 
another reformulation of the problem of calculating /,, apparently 
very attractive and arithmetically simple. Let us try it out by per- 
forming the computation on a four-digit machine. With Jy = 1 — e^! 
— 0.6321 in our machine, we would find the results 


r 0 1 2 3 4 


I, | 0.6321 0.3679 | 0.2642 | 0.2074 | 0.1704 


r 5 6 7 8 


I, 0.1480 0.1120 | 0.2160 | —0.7280 


f 


LM 7.1.5 


(iv) 


Something is obviously wrong, since clearly 7, should always be 
positive. In fact, the poor results (in particular, for 75) are due solely 
to a large magnification of the original error in Jọ. Again we have 
an ill-conditioned formulation! In this particular calculation there is 
no induced instability because all the arithmetic happens to be exact. 
But if there had been any rounding errors in the successive applica- 
tions of the recurrence relation these would also have been magnified, 
and the method would have shown induced instability as well as ill- 
conditioning. 


Backward recurrence The recurrence relation is such an attractive 
idea that we do not abandon it immediately, but look for some other 
information which, while theoretically equivalent to what we have 
already used, might lead to a well-conditioned formulation of the 
problem. We don't want to compute directly any other value of 7,, 
but what we can do is consider what happens when r is large. 


Ax3e* x9e* 


| 


18 


1.0 x [7] 10 x 


For large r, the factor x’ in the integrand, x'e*, is very small (except 
in a small region where x is very close to 1), and so we expect that J, 
will be very small. (In fact the integrand lies between 0 and ex’, and 
so el, lies between 0 and 


1 
ex! dx =—, 
0 rl 


which approaches 0 for large r.) 


This suggests the possibility of doing thc recurrence in the reverse 
direction, using the recurrence relation in the form 


1 
L--7l0-1) 


and starting with the approximation of taking J, = 0 for some large n. 


(0) 


1.0 


Trying it out for n = 10, 12 and 14, we would obtain, using our 
hypothetical four-figure machine 


| r | L(n-10) | Z(212) | L(n—14) 
14 0 
13 0.07143 
12 0 0.07143 
il 0.08333 0.07738 
10 0 0.08334 0.08387 
9 0.1000 0.09167 0.09161 
8 0.1000 0.1009 0.1009 
7 0.1125 0.1124 0.1124 
6 0.1268 0.1268 0.1268 
5 0.1455 0.1455 0.1455 
4 0.1709 0.1709 0.1709 
3 0.2073 0.2073 0.2073 
2 0.2642 0.2642 0.2642 
1 0.3679 0.3679 0.3679 
0 0.6321 0.6321 0.6321 


The results indicate that at last we have found a well-conditioned 
mathematical formulation of the problem, at least for all r with r < 8. 
This time our * data " is the value of n, and we see that as far as the 
calculation of J, is concerned it does not matter whether we take 
n = 12 orn = 14 or (presumably) some even better ** approximation ” 
to the “correct value” of n (i.e. some even larger value)—we always 
get Ig = 0.1009 to 4 figures. Moreover, the method has no induced 
instability; although rounding errors do occur at every step when 
we divide by r, they do not affect the calculated value of J, for r < 8. 


This case study shows, in the first place, how important it is when 
solving a problem to investigate methods closely, to correct when 
possible both inherent and induced instability. It also illustrates how 
recurrence relations, if handled correctly, can give very convenient 
and powerful methods in numerical computation; but that, if handled 
incorrectly, they can give nonsense. In order to be able to use them 
effectively, therefore, you need to know something about the theory 
of recurrence relations and how to analyse the errors in numerical 
calculations based on them. In the remainder of this unit we will 
discuss each of these points further. 


Exercises 


1. 


Use the method of integration by parts to obtain the formula 
L-21-r4r(r-1) (717r (7 yr - e, 


1 
where J, = e^! f x'e* dx. 
o 
Consider the problem of evaluating e~'°, which is about 0.000 05. 
The obvious way is to substitute — 10 for x in the Maclaurin series 
2 us 


fathers ke 


obtaining 
100 1000 10000 100000 


-10 
ees PQ c EE 120 
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The series is convergent, but it is not a good method of calculating e^, 


10 


because of induced instability. 


() 
(ii) 
(iii) 


What feature of the series should put you on the look-out for induced 
instability ? 

What accuracy would you expect in your value for e~'° if you used 
this method to calculate e^ !? on the hypothetical four-digit computer ? 
Suggest a better method for calculating e~'°, giving an accuracy of 
nearly 4 significant figures. (Hint Look at the method suggested in 
Example 1 of sub-section 7.1.2, for avoiding induced instability in 
the solution of a quadratic equation.) 


Solutions 


1 
LX =e Í xe dx 
0 


1 
d 
zen | x 
e [ios 


o 
=e! [x e]} — e7 [ore dx, 
by integration by parts. 
Thus, 
IL21-rl., (r-1, 2, 3...) 
from which 
f-1 =1-(r-Df,-2 
and hence 
IL-21-r(l—-(r—-1).,) 
-1-r-ctr(r-14., 
This process can be repeated, giving 
L-l-rer(r—-l1)-r(r—1(r—24.., 
and 
IL2l-r-cr(r—-1)—r(r—1)(r—2) 
*r(r-1(r—2(r—3)1,.., 
The final stage yields 
Lel-rer(r—l)- 


*GCcCD'rn-(-(-104g 
=1 A rr(r—lY)--(-10rm7!H 
+(- Iri — e7) 


> 


since 


h=1-e!, 
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(i) The calculation involves the subtraction of large num- 
bers to produce a result which is relatively small (between 
O and 1). 

(ii) No accuracy at all can be expected; that is to say, the 
number obtained might bear no relation to the true value 
of e^!9, The term 10 000/24, for example, would be 
stored in the computer as 0.4167 x 10, which is in error 
by at least 0.000 03 x 10? = 0.03. This error alone is 
already much bigger than the sum we want to calculate. 


(ii) A better method would be to calculate e!? from the series 
15 100 1000 x 
e” =14+104 "AE tr +++, and then calculate its 
reciprocal. This time, all the terms of the series are posi- 
tive, so that there is no cancellation and no induced 
instability in its evaluation. 


7.1.6 Summary of Section 7.1 


In this section we defined the terms 


ill-conditioned problem (page C 7) 
well-conditioned problem (page C 7) 
inherent instability (page C 7) 
magnification factor (page C 7) 
absolute condition number (page C 7) 
relative condition number (page C 8) 
induced instability (page C 10) 
floating-point arithmetic (page C 12) 
relative error (page C 12) 


"Techniques 


l. Use condition numbers to predict ill-conditioning. 

2. Perform calculations in four-figure floating-point arithmetic and analyse 
the errors involved. 

3. Consider various methods for a given problem choosing, if possible, 
the most stable and most economic one (as exemplified in the case 
study). 


Notation 
D (page C 6) 
x (page C 6) 
X (page € 6) 
fi(x) (page C 12) 
"n (page C 12) 
5 (page C 14) 
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7.2 RECURRENCE RELATIONS 
7.3.4 What is a Recurrence Relation? 


The formula we used in the last section, 
lTrh-,—1 (r51,2,...) 


is an example of a recurrence formula or recurrence relation. In general, 
if we have any sequence of numbers, say tio, t, , U2, ..., then any formula 
connecting a general element u, of the sequence with its predecessors in 
the sequence is called a recurrence relation. A particularly simple form of 
recurrence relation is $ 


U, =4,u,-; +b, (r=1, 2,...) 


where a, and b, may depend on r. This is called a /inear, first-order recur- 
rence relation (we shall see why in sub-section 7.2.3). The recurrence 
relation J, + r7, -, = 1 is evidently of this type (since it can be written 
I, = —rlI,_,+ 1); so are the recurrence relations involved in arithmetic 
and geometric series: 


Arithmetic series a, a + h, a+ 2h, ... (u, = u,-1 + h) 
Geometric series a, ax, ax”, ... (u, = xu, i) 


Another example of a recurrence relation is the one defining the Fibonacci 
sequence, which we discussed in Unit 5, Determinants and Eigenvalues; in 
the notation of this section it is 


u, = duo + HL (ug =u, =1;r=2,3,...) 


This is an example of a linear second-order recurrence relation. In general, 
any recurrence formula that can be put in the form u, = g(r, u,-, , w,.2), 
but not u, = f(r, u,..), is said to be of second order. 


The following exercises illustrate some of the many ways that recurrence 
relations can arise in practice. 


Exercises 


7/2 
l. O i(z-| xX cos x dx (r =0, 1,...), 
o 


show (by integrating by parts twice) that J, satisfies the linear 
second-order recurrence formula 


4-C)-e-09- e523.) 


(ii) Given that Jọ = 1, write down the exact values of J, and J4. 
2. In the Foundation Course you met the definition of a derivative 


y) =m EHD, 


We can approximate y'(x) by the formula 


Wx + h) — y(x) 
DE ac for some 


small (but not zero) value of / (if y'(x) exists). 


(i) Write down the recurrence relation obtained by using this approxi- 
mation in the differential equation 


y) = x? + yx) 

and by replacing x by rh. (Use the notation 
u, = y(rh) (r =0, 1, ...), 

so that y'(rh) = (u., — u,)fh.) 
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(i) Given that y(0) = 1, and using A = 0.1, calculate approximations 
to y(0.1) and 5(0.2). 


Because of this type of application, recurrence relations are often called 
difference equations. 


If you have time, you might like to solve the differential equation by the 
method described in Unit 4, Differential Equations I and compare the 
results. 


Solutions 
2/2 
l. (i) zf X' cos x dx (r20,1,2,...) 
o 
2/2 d i 
= [ X (sin x) dx 
2/2 
= [x' sin x]? — f rx ^! sin x dx 
o 
nV 1/2 ai d 
= (5) - [ rx ac cos x) dx 
- (3) + [rx ^! cos x]? 
n2 
— Í r(r — )x' ^? cos x dx 
0 
aV n/2 
= (5) +0-[ r(r — 1)x' ^? cos x dx 
2 o 
aV 
ie = (5) -r(r— 12,4 (r72,3,...) 


(ii) To find J, we set r = 2, obtaining 
Jy = (n2? -2x(2—1)x1 since Jo =1 
= (n/2)? — 2. 
To find J, we set r = 4, obtaining 
J4 = (n/2)* — 4 x 3 x [(n/2) — 2] 
= (n/2)* — 12(n/2)? + 24. 
2. (i) The approximation suggested, applied to 
Y G9) = x? y() 
gives 


Ups — U, 
HL = (ny + u, 
h 


or 
upi = (1 + Adu, + r°. 


This is the required recurrence relation (it gives u,, , in, 
terms of u, rather than u, in terms of u,_, as we have had 
up to now; but this is a difference of notation, not of 


substance.) 
(i) With 4 = 0.1, the recurrence formula becomes 


4.44 = (1.1)u, + 1073 x r? 
To find (0.1), which is x, , we set r = 0, obtaining 
p(0.1) = u, = (1.D)ug = 1.1 


since ug = y(0) = 1 (given). 
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To find (0.2) we set r = 1 and get 
y(0.2) = u, = (1.Du, + 107? x 1? 
= 1.21 + 0.001 
= 1.211. 


The differential equation 
YQ) — ya) = x? 
can be solved by using an integrating factor. 


Thus 


© oep (7 3) = x! exp C7 3) 


y(x) exp (—x) = Jr exp (—x) dx. 


Using integration by parts twice 
y(x) = —x? — 2x — 2 + 3 exp (x) 
since 
x0) = 1. 
Hence 
(0.1) = 1.106, 
and 
y(0.2) = 1.224. 


7.2.2 Linear Recurrence Relations of First Order 


The easiest recurrence relations to treat theoretically are those of the first 
order. In this sub-section we will obtain the theoretically exact solution of 
any first-order linear recurrence relation (by " theoretically," we mean that 
the solution may not be in a form suitable for practical calculations). 


We consider a special case first: the recurrence relation 
u, = au. +b (r=1, 2, ...) 


where a and b are fixed real numbers (i.e. independent of r). Applying this 
in turn with r = 1, r = 2 and so on we can express 4; , U2, U3, etc. in terms 
of ug. The recurrence relation itself does not tell us the value of up; we 
shall treat it as an arbitrary constant, analogous to the arbitrary constant 
in the solution of a first-order differential equation. Denoting this arbitrary 
constant by c and applying the recurrence relation we obtain 

Ho —c 

u, — ug +b =ac +b 

u, — au, +b — a?c + (a + 1)b 

us = au, +b =a@°c + (à? +a + 1)b 
and in general 

u =de + (dT! a7. 4a41)b (r215,2,...) 


Using the formula for the sum of a geometric progression this can be 
written in one of two alternative forms: 


dfa #1, u, =ac + 
a= 


(1) 
ifa =1, u,=ce+rb (r=1, 2, ...) 


We can generalize this calculation to the case where a and b depend on r, 
so that the recurrence relation has the form 


u, = @,u,-, +b, (r =1, 2,...) 


Again taking uo to be an arbitrary constant and calling it c, and then using 
the recurrence relation with r = J, then r = 2, then r = 3, and so on, we 
find 

ug —c 

Uy = duo + b, — aac by 

Uz =U, + b; — aac + a5 b, + b 

U3 = d3U5 + by = a3a5a,€ + azab, + azb: + b3 
and in general 


u, = (a,a,-1 +++ Gp Qy)e + (a,a,-, ...a305)b, + (aa, +++ 0403)55 
++ bab, +b, (r =1, 2, ...). 


This can be written more concisely using the summation sign: 
r " 
u, = (a,a,-, 111 434,)€ +} (aa, 1 Asay)bs Q) 
s-1 


with the convention that for s = r in the summation, the product a, ::: a, ,, 
is 1. This formula is analogous to the formula at the top of page K97 
which gives the general solution of a first-order linear differential equation. 
Notice that it gives u, as the sum of two parts, one proportional to an 
arbitrary constant c and the other independent of c. We shall see later in 
the unit that these two parts correspond to the two parts in the solution of 
a linear problem: the first is in the kernel of a linear transformation associa- 


ted with the recurrence relation, and the second is a particular solution of 
the recurrence relation. 


Example 1 
The general solution of the recurrence relation 
u,—1— ni, (r21,2,..) 


which we used in our case study (sub-section 7.1.5), is obtained by 
setting a, — —r and b,=1 in the general linear recurrence relation 
U, = a,u,- + b,. Thus, using the formula of Equation 2, we obtain 


u, = (== =D) C27 De 
* Eco (r-1)-- C G0) 


=(-Drle Y C- 1X 76 — 1) (4 D, 
s=1 


from which 
Up =C 
—e+1 
uy = 2c +(-1)x24+(-—1®x 1 =2ce-1 
uy = —6¢ + (— 1)? x 3x 2+(-1)! x 34+ (-1)° x 1 = —6c + 4. 


ll 


uy 


Any value of c gives a solution of the recurrence relation. If we want u, 
1 


to be equal to J,, which is e X'&' dx, we need an extra condition to 
0 
determine c. Theoretically, the obvious choice is the known value of J, 


1 

which is el e*dx = 1 — e^! Thus, since c— J), c = 1 — e^!; with this 
0 

value of c, the above solution gives 


=(P — 673) Y (-1Y7r( — 1-7 (41). 
FL! 


Practically, however, as we have seen in the case study, this formula is not 
suitable for the numerical evaluation of J, unless r is quite small, since 
e^! is not known exactly and is multiplied by a large factor. 


Exercises 


1. Write down the general solutions of the following recurrence relations, 
by using the formulas in Equations (1) and (2): 
(Gi) u, = 2u,_, 
Gi) u, —ru-=0 . 
(ili) uy — (r + Iu, =0 

2. For the recurrence relations in the previous question, pick out the 
particular solutions that satisfy ug = 1. 

3. You borrow £S from a building society on 1st January at 1% per 
annum interest and pay back £P every 1st January thereafter. If the 
amount you owe them just after your rth repayment is £u,, find a 


recurrence relation connecting u, with u,_, by filling in the blanks 
below. 


Tust after the rth repayment, you owe 


——— M9 


Just before the (r + 1)th repayment, you owe 


——— d— —À 
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Just after the (r + 1)th repayment, you owe 


Hence u+; = 
Hence find a formula for the amount you owe after the rth repayment, 
in terms of r, S, P and J. 


Solutions 


1. (i) Equation (1) with a — 2, b — 0, gives u, = 2'c. 
(i) Equation (2) with a, = r, b, = 0, gives u, = rc. 
(ii) This is identical with the previous case. All that has 
happened is that we have r -- 1 where r was before; so 
the answer is 


Ua = (r + 0e, 
which is equivalent to 
u,— ric. 


2. Since up = c, the appropriate choices from the general solu- 
tions given above are (i) u, = 2"; (ii) and (iii) u, = r1. 

3. Just after the rth repayment, you owe £u,. Just before the 
(r + 1)th repayment, you owe 


I I 
£u, + 1002 - zu * sx)" 


Just after the (r + 1)th repayment, you owe 


I 
thor (1e ge (r20,1,...) 


T 
This is of the form u, = au, ., + b with a = 1 + 10 b=-P, 


and with r replaced by r + 1. 


The general solution of the recurrence relation is therefore the 
one given in Equation (1), 


zx uie 
(iux) - 


To find the solution appropriate to our case we use the one ré- 
maining piece of information, which is that the amount 
owed initially (i.e. exactly one year before the first repayment) 
is £S. In our notation, this is £v9, and therefore we have 
c — S and so the answer to the problem is 


IW 100P IY 
w= (rug) 5 I (+35) i). 
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7.23 Linear Recurrence Relations as Linear Problems 
In sub-section 7.2.1 we promised to explain why we are entitled to call 
the recurrence relation 

i= au, + b, 


linear. Actually, such an explanation is not entirely straightforward because 
the word “linear” has a very special meaning in this course and the use 
of the word in this context, whilst fully justified, is one step removed from 
our usage. In the course we use it to describe a special type of mapping 
between vector spaces. So where are the vector spaces here? 


The vector space is the same in the domain and the codomain. It is the space 
of all infinite sequences of real numbers: an element of the space has the 
form 


U = (ug, Uy, 5, ...) 


where the u, are real numbers and the list of numbers goes on for ever. 
It is an obvious extension of the concept of the spaces 


R? (pairs of numbers) 
R? (triples of numbers) 


R" (n-tuples of numbers, where n is any fixed positive integer) 
and the operations are very similar. 
(uo, Uys U25...) + (Vo, Vis V25 +++) = (Wos Wis Ws se) 
where w, =u, 4- v,, and 
(uo , His Uz, +++) = (Wos Wis We, ss) 
where w, = au,. A useful symbol for the space is R”. 
The problem of solving a recurrence relation such as 
u, = a uu, + b, 
is equivalent to finding a sequence u, such that 
u, — A, u,- = b,. 


This problem fits the format of our general linear problem, for putting 
U, — a,u,- = wr 1, 2,...), the general term of a sequence w, and 
Wo = 0, we define a mapping 


T:u-—w. 


Thus, in trying to solve the recurrence relation, we are trying to solve the 
equation 
T(u) =b 


R” Re 
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It remains to show that T is a linear transformation, in which case the 
recurrence relation we started with simply specifies a linear problem and so 
it would seem reasonable to describe it as a linear recurrence relation. 
Exercise 


Show that T is a linear transformation. 


Solution 
Tu +y) =w, 
where 
W, =U, + U, — au, c. + Li) 
= at — QU, +U, — @,0,-4 
ie. 
T(u + v) = T(u) + T(v) 
Also 
T(au) = w 
where 
W, = QU, — Q,0U, -1 
= a(u, — a,u,-1) 
ie. 


T(au) = aT(u). 
Thus T is a linear transformation. 


There is a whole class of linear transformations of R® to itself which bear 
a very close relationship to the linear differential operators that we met 
in Unit 4,, Differential Equations I. One obvious similarity is that in both 
cases the vector spaces have infinite dimension. But there is another simil- 
arity. Linear differential operators are built up from a simple constituent 
linear transformation, D, the differentiation operator. The same is true 
in the case of linear transformations from R® to itself which lead to recur- 
rence relations; they are built up from the transformation E defined by 


E: (ug, ty, U2, ...) — 9 (uy, 5, Ug, ...). 


You might call it the “ beheading mapping”: its effect is to shift every 
term in the sequence up one place nearer the head, and to lose the “head”, 
ug. Eis commonly called the shift operator. 


Exercise 


Show that £ is a linear transformation. 


Solution 
E(uo + Vo, Uy 94, ...) = (Uy + Vi U2 + 02...) 
= (Uy, U2,.-.) + (Wis P2,---) 
= Eg, Uy) tg, ++) 
+ E(vo, Vis 025 ---) 
E(uug, AU, Auz, ...) = (uu, au, ...) 
= a(uy, us, ...) 
= gE(uo, Uy, U2, LL.) 
In Unit 4, Differential Equations I we saw how D can be used to build up 


more complicated linear differential operators. Much the same thing can 
be done with £. For example, 


E+2:u-— Eu + 2u, 
E?43E41:ue > E(Eu) + 3£(u) + u, 
(E 1E + 2) : u —— (E + 1)(E + 2)u) 


are all linear difference operators. 


The comparison does not end here—the methods of solution in each case 
are strikingly similar. There is, of course, an obvious similarity arising 
from the fact that they are both linear problems, but also the actual 
technique of finding the kernel is much the same. This is hardly surprizing 
because we know that E has something to do with differentiation; in the 
Foundation Course we introduced differentiation by way of differencing. 
But there is also a similarity between the vector spaces, apart from the 
fact that they are both dimensionally infinite. The domain and codomain 
of D consist of continuous functions. The space A? could also be identified 
with a space of functions having domain Zi; for a sequence u can be 
identified with a function 


firmu, (re Zp). 


The vector space operations in R® then correspond to ordinary addition 
of functions and multiplication of functions by real numbers. 


As with differential equations, we classify recurrence relations by their 
order. The order of a recurrence relation is simply the highest power of E 
in the equation. Thus 


(E + 3)u=0 
is first order, 
(E? + 4E — 3u - 0 


is second order, and so on. When the equation is written in terms of the 
coordinates of u, such as 


u, + 3u,-2, 


the order is simply the difference between the largest and smallest suffices 
—in this example the order is 1. The equation 


U, rusa + 4u,-2 = 3" 


is second order, and so on. 


Solving recurrence relations 


As with differential equations, the first step in solving a recurrence relation 
is to find the kernel. The analogy continues because, as with differential 
equations, the dimension of the kernel is the same as the order of the 
recurrence relation (see Unit 4, sub-section 4.2.2). For example, in the first- 
order case, the kernel will be the solution of a problem such as 

Una — au, = 0 
or 

Una, = dU. 

We wish to show that the solution space of this problem is one-dimensional 
and we can do this by showing that it is isomorphic to the one-dimensional 
space R. Given any ug e R, the sequence u is defined uniquely by 

Uy = Qg ug 

Uz = AyAg Uy 
and so on. The mapping 


fo: U-_ ü 
0 o 


from the solution space to R is one-one. It is easy to complete the proof 
that fọ is an isomorphism (notice that ug can be any element of R), and 
so the kernel of the first-order problem is one-dimensional. The single 
sequence 


(ao; 4180, 424180, ..-) 


is a basis for it. 
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A similar analysis can be adopted for the more general cases. Thus to find 
the kernel for an nth-order recurrence relation, we need to find 7 linearly 
independent solution vectors to form a basis. The way these are found is 
again very similar to the way the problem is tackled in differential equations. 
To illustrate the technique, and because we shall be needing the results 
later in the unit, we shall have a look at second-order, constant-coefficient 
recurrence relations, i.e. relations of the form 


(E? - aE + bju =w 


where a and b are real numbers. The kernel of the linear transformation 
E? + aE + b is the solution set of the homogeneous equation 


(E? + aE + bu =0. 


In the analogous case with differential equations, the kernel had as a basis 
functions of the form x —-—-—» e** (see sub-section 2.2 of Unit 4, Differ- 
ential Equations I). This gives us a clue as to how to proceed in this case if 
we notice an interesting feature of these functions. They satisfy the equation 


Df= f. 


That is to say, the basis of the kernel consists of eigenvectors of D. (Notice 
that D is a linear transformation from an infinite-dimensional space and 
has an infinite number of eigenvalues: A can be any real number. To find 
a basis for the kernel we select just the ones we need.) 


The eigenvectors of E satisfy the equation 
Eu = 2u 


Ura = Au, 


Upya Ay, 
for any ug e R. 


So the sequence (1, 2, 47, ...) is an eigenvector of E for any value of 2. 
If we try such sequences as solutions of the homogeneous equation 


(E? c aE + bu = 0, 
we find that 2 must satisfy the equation 
A +ai+b=0. 
We refer to this as the auxiliary equation as in the other context. 


Thus when this equation has two distinct roots, 4, and 45, the problem is 
solved; for the sequences 


Ay = (1, A,, 42, 23,...) 
de = (1, 22, 25, 28...) 


give two independent vectors which span the kernel. Thus for the problem 
(E? + 3E + 2)u=0. ’ 
for example, we choose À to take the values of the roots of 
4? 4+324+2=0 
and so the solution space is 


«4, C Ds (= 055 (7 05,...) (= 2), (— 25 (= 2), «> 
<u, v», where u, = (— 1)’ and v, = (— 2)’, r =0, 1, 2,.... 
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Having found the kernel, we now need a particular solution of the non- 
homogeneous equation 
(E? c aE + bju =w 


to complete the solution. We shall, however, not go into methods of 
finding the particular solution (see the example and exercises below), nor 
into the procedures to adopt when the 2’s are equal or complex. If you are 
interested, you might like to try to continue the investigation when we 
tackle similar problems later for differential equations. 


Insub-section 7.2.2, we have seen that it is a relatively straightforward matter 
to write down the solution set of first-order linear recurrence relations and 
some second-order ones. But even then we may be left with problems; for, 
whenever we want to calculate numerical values for these answers, i.e. 
calculate a specific u,, we may find the answer swamped by errors. The 
numerical aspect is the main interest of this unit and so we shall turn to 
this problem in the next section. 


Example 
Find 


(i) the general solution of the recurrence relation u, = u,., + 1,2 ; 
(ii) the particular solution with uo =u, = 1 (in which case u, is the rth 
Fibonacci number); 
(iii) the general solution of u, = u,-, + u,-2 + 1. 


G) In this case the linear problem is homogeneous, and the solution is 
u, = CA, + 0345 
where 2, and A, are the roots of 
4 -2-1=0; 
ie. Ay =4(1 — 5) and 4; = 4(1 + /5). 
(i) Ifuo =u, = 1, we have 
Up = Cy +e, =1 


uy = Cy C = A +e C +f) =1, 


2 


and the solution gives the rth Fibonacci number 
PER pi 
ES (( 245 =) an 
J5 2 2 


(This we found by a different method in Unit 5, Determinants and 
Eigenvalues.) 


(iii) A particular solution, p,, can be found by putting p, — constant. 
since the non-homogeneous term in the recurrence relation is a con- 
stant. Putting p, = c, = constant, we find c = C3 +c +1, giving 
c3 = —1 and the general solution 


Uu, = Cy C 5j + C2 C ty i: 


Exercises 

1. Write down the general solutions of 
G) u-u.it2uá. 
(ii) u, + 5u,_, + 6u,_, =0 


2. Solve u,+2 + u,4, = 6u, with ug — 0, u -l. 
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3. Find the general solution of 
Uy, — 10.1u, + u, = — 2r, 
given that there is a particular solution of the form 
U, = C3r + C4. 
Solutions 
1. The auxiliary equations are 


G) 4? — 14-2 =0 with roots 2, 22, 2; = —1, 
(i) A? + 524+ 6 =0 with roots 4, = —2, A, = —3. 


The general solutions are therefore 


@) u =c + ¢(-1), 
Gi) u, = e(72Y + e 7-3. 

2. Using the method of Solution 1 the general solution is 
u, = C2" + ¢2(—3)". The conditions vp = 0, u, = 1 give 


ce, +c, =0 
2c, — 3c, = 1; 
so that c, —1, c; — —1 and the required solution is 


u, = 40 — (-3)y) 
3. The solution of the associated homogeneous problem is 
110" + ¢2(+'5)’, since 10 and 45 are the roots of 


4? —10.14 4-1 —0. 


If cr + c4 is a solution of the non-homogeneous equation, 
substitution gives 


(elr + 1) + c4) — 10.1(esr + c4) + (es(r — 1) + c4) 
= —2.7r, 


which is true for all r if c4 = 4, c, = 0. The required solution 
is then 


u, = ¢,10" + eG + dr. 
7.2.4 Summary of Section 7.2 


In this section we defined the terms 


recurrence relation (page C 22) 

linear recurrence relation (page C 22) 

linear difference operator (page C 29) 

order of a recurrence relation (page C 30) 
Techniques 


Solve first-order linear recurrence relations, and second-order linear recur- 
rence relations with constant coefficients. 


Notation i: 


R? (page C 28) 
E (page C 29) 
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7.3 NUMERICAL ANALYSIS FOR RECURRENCE 
RELATIONS 


7.3.4 First-order Recurrence Relations—Forward Recurrence 


Even though we may be able to write down the general solution of a recur- 
rence relation, there remains the problem of calculating the values of the 
numbers ug, wu, ..., of investigating whether small changes in the data 
make large or small changes in the members of the sequence (the question 
of inherent instability, ill-conditioning), and whether the rounding errors 
of computer arithmetic have serious or tolerable accumulation (the ques- 
tion of induced instability). In this sub-section, we do this for the problem 
of calculating up , 1; ... by “forward recurrence ”, that is, by starting with 
a given ug and then calculating u,, uz, and so on. 


We first examine the question of inherent instability for the general first- 
order recurrence relation 


u, = aub, (1) 


For simplicity we assume that all the numbers a, and 5, are known exactly 
(and can be stored exactly), but that the initial value ug is inaccurate, 
either '' physically" or “ mathematically ". We have seen (sub-section 7.2.2) 
that the general solution of Equation (1), corresponding to the initial 
value ug = c, is 


r 
u, = (a, age + LG -tt Asar)bs 
E 


Notice that, although we did not develop the solution in this way, the 
formula for u, falls into two parts; the first part is an element of the kernel 
of the linear transformation 


T:u-—w, 
where w, = u, — a,u,.., , and the second part is a particular solution of 
Tu) =b. 


For the purposes of calculation, since the as and bs are known exactly, 
we can regard each u, as the image under some function c ——— u,. 
We can test for inherent instability (ill-conditioning) by calculating the 
absolute condition number (see sub-section 7.1.1), 


du, 
dc 


Thus, a small error in the initial value c is magnified or diminished by a 
factor |a,::*a,| . If the magnitudes |a,|, |az|, [a5], ... are all greater 
than 1, this condition number may thus become very large as r increases, 
indicating that the problem is ill-conditioned. 


Usually we are interested more in the relative than the absolute error, and 
the appropriate measure of ill-conditioning in this case is the relative 
condition number 


c du, 
u, dc 


(a, `>- a))c 
(a, aye + Dias (a, 21), 


element of kernel 


(Notice that this is 


element of general solution ) Although this formula is 


more complicated than the last, we can still see the possibility of ill-con- 
ditioning: if |a, --- a,| increases as r increases, while the numbers u, we 
are trying to calculate increase less rapidly, or decrease, then the problem 
will be ill-conditioned for large r. In such a case, no matter how ac- 
curately we perform the subsequent computation, there will be a large 
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error in the computed result, both relatively and absolutely, and the prob- 
lem is inherently unstable. 


We can now explain the phenomena described in sub-section 7.1.5 relating 
to the computation of J, = ep x'e dx from a first-order recurrence 
relation which, to avoid ambiguities of notation, we write as 
u,=1—ru,_,. 
Here we have a, = —r and hence 
absolute condition number — r! 


cr! 


relative condition number — 


U, 


The relative condition number is greater than r!, since with c = ug taken 
as Jo, which is 1 — e~!, the Sequence ug , Ui, 42, ... can be shown to de- 
crease. Thus for the values of r in the neighbourhood of 7, the relative 
condition number is somewhere around 7! — 5040 and the problem is very 
ill-conditioned. This explains the enormous errors obtained when this 
method was used in the case study to calculate I and similar integrals. 


On the other hand, if we had been interested in the same recurrence 
relation but with vo = 1 (instead of 1 — e^), the value of u, would have 
been — 1754. In this case the relative condition number would have been 


0 
s which is about 3, and so the problem of calculating u, by this 


method would have been reasonably well-conditioned. An error of +1074 
in the value of uy produces an absolute error of +0.5 in u, but a relative 
error of only about 1 part in 3000. 


Turning now to the problem of induced instability, it is clear that the 
machine is likely to produce rounding errors in the successive computa- 
tions of u,, u;, ... . For example, the exact formula for u, is 


Uy = aquo + b, 


but the floating-point machine computation will introduce an error ei 
so that the number actually stored by the machine is 


ü, = adio +b, +e, 
where i, is the stored value of u, . 


The next calculation, that of u; , thus starts from the incorrect value ii, in 
place of u,; moreover the floating-point arithmetic introduces a further 
error, say ez, so that the number stored by the machine in place of Uy is 


fiz =a; +b, + ez 


Continuing in the same way we see that the numbers stored by the machine 
satisfy a recurrence relation 


u, =4@,U,-, +b, +e,. 


The general solution of this recurrence relation by the formula of Equation 
(2) of sub-section 7.2.2, is 


r 
ü, = (a, ^ ag)itg + > (a, as+1)(b, + e) 
sz] 
and thus it differs from the exact solution 


r 
u, = (a, aio + Y, (a, as+1)b, 
szi 


35 


LM 7.3.1 


by an amount 


(a, a))&o + Y, (a7 85465 
s-1 


where eg = iig — Uo is the error in the original data. Thus, if the numbers 
Jail, [az]; -.. are greater than 1, and if r is large, then not only the 
original error eg but also the other errors e; €z, etc., produced early on 
in the calculation will be magnified as the calculation proceeds. Thus in 
this problem the conditions which favour ill-conditioning (values of 
lail, |a2|, etc., exceeding 1) tend to favour induced instability as well. 
(In later units on numerical analysis we shall see examples where in- 
duced instability arises even in Well-conditioned problems.) On the 
other hand if the numbers |a,|, |a2|, etc., are less than 1, then both the 
initial error in ug and the later ones ¢,, e2, ... will be diminished as the 
calculation proceeds to high values of r, and so the problem is likely to 
be both well-conditioned and free from induced instability. 


The situation can be summarized in this way: if the solution we are looking 
for does not increase with r as fast as the solution of the homogeneous 
equation (the kernel), then for large r the solution we want will be swamped 
by an unwanted contribution from the kernel, stimulated either by errors 
in the data or by rounding errors from early stages of the calculation. On 
the other hand if the solution we want is increasing with r as fast as, or 
faster than, the unwanted contribution, then it can be calculated with small 
relative error, though perhaps large absolute error, by this method. 


Exercise 


Working to four significant figures, as in four-digit floating-point arith- 
metic, we have calculated y,, y2,..., yg for the recurrence relation 
Jesi =1—(r + 6)y,, taking the initial value yg first as 0.15 and then as 
0.16. The results are shown in the accompanying table. How much of the 
difference in the two values of yg is due to induced instability? 


(Hint Can you calculate how much of it is not due to induced instability 
—that is, how much is due to inherent instability ?) 


— 


yo = 0.15 yo = 0.16 
r » (r+ Oy, » (r+ y, 
0 0.1500 0.9000 0.16 0.96 
1 0.1000 0.7000 0.04 0.28 
2 0.3000 - 2.400 0.72 5.76 
3 —14000:|  —12.60 —4.16 —42.84 
4 13.60 136.0 43.84 438.4 
5| —135.0 —1485 —4314 —4811 
6| 1486 4812 


The difference of 0.01 in yọ leads to a difference of 4812-1486 = 3326 
in yg. 
Solution 


To see how much of this difference arises from induced instability 
we compare it with the difference to be expected in an exact cal- 
culation. The exact solution of the recurrence relation is 


y, 7 (~ U'(r + 5y(r + 4) -++ (D(6)c + terms independent of c 
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A change of 0.01 in c causes a change in the exactly computed 
value of y, amounting to 


11x10x9x8x7x6x0.01-33264 


This is the change in y, due to ill-conditioning alone. The rest of 
the change, which is due to induced instability, is 3326 — 3326.4 — 
—0.4. In fact, because of the simple numbers used, the only 
rounding error in the whole calculation is in the last value of yg for 
yo — 0.16. 


7.3.2 First-order Recurrence Relations— 
Backward Recurrence 


1 
In our case study of the evaluation of J, = e^! | x'e” dx from the recurrence 


o 
relation 7, =1-—rl,-,, we found that instead of starting from 7, and 
working away from r — 0, it was much better to use the same recurrence 
relation but work towards r — 0. In other words we now regard the recur- 
rence relation as giving J,_, in terms of 7, , writing it in the form 


1 
f-1= (1-2). 


Since the recurrence is now going the opposite way, it is reasonable to 
expect the errors to be diminished rather than magnified. 


To start off the calculation with this form of the recurrence relation, we 
used the fact that J, approaches zero for large r, and took as our starting 
point the approximation J,=0 for some sufficiently large value of n 
(about 14). Then we calculated successively J,-, J,-2, and so on, and 
found that these numbers were surprizingly insensitive to the chosen values 
of n and J,, indicating that this problem is very well-conditioned. 


To remind you of the very good conditioning of this formulation of the 
problem of calculating 7,, we give some more numbers calculated from 
the backward recurrence relation, this time calculated working in floating- 
point arithmetic with six decimal digits. 


14 0 -1 
13 0 0.071 428 6 0.142 857 
12 0.076 923 1 0.071 428 5 0.065 934 1 
11 0.076 923 1 0.077 380 9 0.077 838 8 
10 0.083 916 1 0.083 874 5 0.083 832 8 
9 0.091 608 4 0.091 612 6 0.091 616 7 
0.100 932 0.100 932 0.100 931 
0.112 384 0.112 384 0.112 384 
0.126 802 
0.145 533 
0.170 893 
0.207 277 
0.264 241 
0.367 880 
0.632 120 


CQ — t Uv & Ca Os - co 


The computed uy agrees with the “ required” value 1 — e^! toall six figures. 


The results confirm our expectations that the problem is well-conditioned, 
and that the errors decrease very rapidly as r decreases. By taking the 
three different starting values, and comparing the results, we see that the 
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three values of each of ug , u5, ..., ug are essentially equal, to six decimal 
figures, and though consistency is not always a good check, it would be 
very surprizing if it failed here! But in fact, as we shall see in a moment, 
we can perform a complete error analysis and specify in advance the value 
of n for which any particular u, (with r < n), will have whatever precision 
we require. 


To do the error analysis we shall consider the more general linear recur- 
rence relation 


Up =a, u,- +b, 
Provided all the a, are non-zero, we can solve for u,_, obtaining 


u, — b, E 
uy = 


a, 


To obtain the general solution starting with a particular value of u, (with 
n large) we apply the recurrence relation with r = n, then r =n — 1, then 
r =n — 2, and so on. This gives 


us — b, 


yi = ua, — balan 


In 


Un-1 — bn- 
Un-2 = a = Upl (Gq Gy 1) — ballan Gp —1) — b, ia, 
In-1 


and, in general 


U, = Unf (s.a) — Bal Gn 0, 4 7 G41) 


= bailan- tt ap) 1 = Beasley 
= tal (n° ** ar41) — 2 Plas 4,41) 


This time we are using u, to determine u,, so that the absolute condition 
number is 


1 


Qn" ** s, 


and the relative condition number is 


Up 


177 Apu, 


If all the numbers |a,|, |a;], ... are greater than 1, then the absolute 
condition number will be small. Also, if the correct value of |u,| is less 
than |u,| for r « n, the relative condition number is even smaller. Thus, 
under these conditions, the problem is very well-conditioned. 


These conclusions apply to the recurrence relation for the integral 


1 ; 
ef xe dx. 


0 


The recurrence relation Z, = 1 — rZ,., tells us that 4, — —r and so the 
absolute condition number is 


1 


That is, any error in the initial value assumed for J, is multiplied by this 
factor to give the numerical value of the error in the calculated value of 
1,. Suppose for example, that we want Lio correct to 4 decimal places, 
that we start with u, — 0 and (for the moment) that we make no round- 
ing errors in the calculation. (It then follows, incidentally, that we also get 
Ty, Ig, Iņ, etc. correct to 4 decimals.) By assuming v, — 0 rather than 
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its correct value 7,, we make an error of magnitude 7, ; and this leads to 
an error of magnitude 


I, 


leol la D--121 


in 7,9. Now J, can hardly exceed 0.5, since Ig =1 — 7! =0.632--- and J, 
decreases as r increases. The requirement that J, be correct to 4 decimal 


places means that we want |e,9| <4 1074. Thus we want to choose n 
so that 


0.5 eh y 19-4 0.5 
nx(n—-l-12x1l 2 ~ 10 x 10 x 10 x 10 


By trying a few values for n if necessary, we can find that n = 14 (or any ` 
larger integer) satisfies this condition (for there are then 4 factors in the 
denominator on the left, all greater than 10, and 4 factors of exactly 10 on 
the right). So by using the approximation J,, — 0 we can guarantee (at 
least with exact arithmetic) that our values of Lio» I5, 1g, etc., are correct 
to 4 decimal places. 


Will the rounding errors of floating-point arithmetic produce any signifi- 
cant induced instability? As in the error analysis for forward recurrence, 
we can allow for these errors by replacing the recurrence relation by 


u, —b, 


r 


U,- = +e. 


Solving this by the same method as before, gives 


Uy = u0, a) Y (Blas aei) ess aei) 


srl 


= (exact solution starting from u) — $, e,/(as-1*** 8,41) 
srl 


Thus the error in u, due to induced instability is 


enata uq | 
i Gey Anny? 05. 

If all the |a|'s are greater than 1, the terms die away rapidly as we go along 
the series, and so most of the total effect of rounding error is little greater 
than the first term and this represents the rounding error introduced by 
the very last arithmetic operation in the calculation. Thus the conditions 
(|a|'s greater than 1) which make this problem so well-conditioned also 
make it free from induced instability. For example, in our case study, we 
are working to 4 significant figures with values of u, that are less than 1; 
thus the errors e, 4, 6,42, ... in the above formula do not exceed 4 x 1074 
and the entire formula (with r = 10) tells us that the error in 1,9 is at most 


E 1 I 1 
ix whit +T e)ess 107* (ie) 


1] 12x1l 2 10 100 
1 T 
-2.x10*x 1 
2 T5 


This formula tells us that the error in u,o due to the rounding is very little 
greater than the rounding error in the last step; i.e. in calculating 


uio  — uy 


The main lesson to be learned from this error analysis is this: for a first- 
order recurrence relation, if the solution we are looking for grows less 
fast with increasing r than the solution of the homogeneous equation (the 
kernel), then it cannot be calculated by forward recurrence but it can be 
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calculated by backward recurrence; for as we decrease r, the unwanted 
contribution from the kernel decreases more rapidly than the solution we 
want. 


Exercise 


7/2 
The integral 7, = f x?" cos x dx satisfies the recurrence relation 
0 


(3 2r 
1-() -2Qr- DL, (r=1,2,3,...) 


with the initial condition J) = 1. . 


(i) Would you recommend forward or backward recurrence for calculat- 
ing I, using 4-figure floating-point arithmetic? Justify your choice. 

(ii) Show theoretically that the approximation 7; = 0 is sufficiently accu- 
rate to give four-figure accuracy in the above calculation of J,. (You may 
assume that 


{2 14 q 1 qM 60. 
naf x x= (5) < 60.) 


Solution 


(i) Backward recurrence is the recommended method. The solu- 
tion of the homogeneous problem associated with the given 
recurrence relation 


k, = —2rQr — 1)k,_}, 


is k, = (— 1)"(2r)! c, which increases rapidly with r. To avoid 
unwanted contributions from elements of the kernel, we 
should recur in the direction in which this function decreases 
—i.e., backwards, in the direction of decreasing r. 

(i) Since the absolute condition number is 


the absolute condition number in calculating J, form I, is 


1 1 1 -6 
aaa, (I4 x I2 x 1110 x9) ^27 10 ^ 
Since a, — — 2r(2r — 1) in our recurrence relation. The error 


15 
in replacing the true value of J, by 0 is at most + 6) < 60 
and the consequent error in J, is at most 
4 x 107° x 60 = 30 x 1076 < 4 x 1074 


which is acceptable since only 4 places of decimals are 
wanted in 74. 
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7.3.3 Second-order Recurrence Relations of Initial-value 
ype 


We have seen that a second-order recurrence relation has a general solu- 
tion which is the sum of a particular solution and the general solution of 
the associated homogeneous equation (the kernel). For example, we 
Observed in part (iii) of the example of sub-section 7.2.3, that the equation 


Uy =u,- + Uy +1 


has the general solution 


a «( = P al! a, 


1 45V 1-J/5Y f 
where Em and ( 2 are independent solutions of the associ- 


ated homogeneous equation u, = Up-1 + yg. 


Such a solution has two arbitrary constants c, and c; , which will be deter- 
mined by the imposition of two more pieces of information about u,. The 
most common are 


(i) the specification of uy and u, (or of u, and u,..,) from which uz ,us, 
+++ (OT Uno, 9,5, ...) can be obtained uniquely by forward (back- 
ward) recurrence from the given recurrence relation; or 

(ii) the specification of uy and u,, from which we hope to determine the 
intermediate values u,, u;, ..., u, ,. 


Problem (i) is called an initial-value problem, and problem (ii) a boundary- 
value problem. Here we shall consider briefly some aspects of the initial- 
value problem, because they are quite similar to those we have discussed 
for the first-order recurrence relation problem. 


From what we have learnt in the first-order case we can now easily see that 
the initial-value problem will be ill-conditioned whenever the required 
solution is less dominant than any of the elements of the kernel. Moreover 
in these circumstances induced instability will be manifest and will have 
the same nature as the inherent instability, just as in the first-order case. 


Consider, for example, the homogeneous second-order relation 
Ure, — 10.1u, + u,- —0, 
whose general solution is 
u, = ¢,10" + 041077" 
T 
10 
we have theoretically suppressed the kernel* element 10’, and should in 


theory obtain the solution u, = 107" as a result of direct calculation. Some 
results obtained with four-figure floating-point arithmetic are 


With the specification ug = z, u, = —, we find thatc, = 0 and c; = z sothat 


r 0 1 2 3 4 5 


u, 3.142 | 0.3142 | 0.0310 0.001 100 0.04211 0.4242 


r 


which not only “go wrong" very quickly but soon exhibit the tendency 
for successive values to exhibit growth by factors of 10, corresponding to 
the kernel element c,10". 


Can we recover the situation by reformulation using backward recurrence, 
based on the knowledge that our required solution tends to zero as r gets 
larger? Since it is a second-order problem we need to specify both u, and 


* Note that {10’, 107") forms a basis for the kernel. 
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14,.,, and we take u„ = 0 which is a reasonable approximation for large n, 
and give u,-., some non-zero value. The latter is probably not a good 
approximation, but if ij, is the computed solution of this problem then ki, 
is also a solution (since the equation is homogeneous), and we can calculate 
a good value of K by making kiig equal to the specified value of uo. If k 
is small, then ki,_, is also a reasonable approximation to the true value 
of u,-,, and then the computed kī, should be close to the required solution 
u, for r — 0, 1, 2, ... with small errors as r tends to n. 


To introduce some rounding errors immediately we take v, =0,u,-, =e, 
and with 1 = 5 find results for the “trial” solution iz. 


ris 4 3 2 1 0 


T, | 0 | 0.3679 | 3.716 | 37.16 | 371.6 | 3716 


.142 y " 
Now multiplying this by (=) = 0.000 845 5, in order to satisfy our 


given initial condition ug = z, we obtain the very good values 


4 3 2 1 0 


u, | O | 0.000 311 1 | 0.003 142 | 0.031 42 | 0.3142 | 3.142 


There are only small absolute errors everywhere, even at r — 5, and only 
small relative errors almost everywhere, even as far away from the origin 
asr = 4 =n —1 in this case. 


We see that in practice the homogeneous second-order problem is very 
similar to the non-homogeneous first-order problem, and this is because 
in each case the solution falls into two distinct parts: in the first-order 
problem the parts are the element from the kernel and the particular solu- 
tion, in«he homogeneous second-order case the parts are the two inde- 
pendent solutions which form a basis for the kernel. We should perhaps 
remark that the analysis of a good starting point r =n is here more in- 
volved, but the use of the “ consistency ” argument which we demonstrated 
on page 38 is in practice quite satisfactory. 


In the homogeneous case our backward recurrence succeeded because the 
required solution, one of two possibilities, dominated in this direction. 
The non-homogeneous second-order problem, with three parts to its solu- 
tion (the two independent solutions for the kernel plus the particular solu- 
tion), produces a new situation in that the solution we want, one of three 
possibilities, may not dominate with recurrence in either direction. 


This situation exists, for example, with the problem. 
ue, — 10.14, + u,- = —2.7r, 

with the initial conditions ug = 0, u, = 4. 

The general solution (see Exercise 3 of sub-section 7.2.3) is: 
u, =} r + c,10 4 c,1077, 


and the result of applying the initial conditions gives c, = c, — 0. But if 
we now begin at uy and make a direct numerical calculation, we know that 
using forward recurrence the term 10" will take charge. 
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Forward recurrence 


r 0 I 2 3 4 5 6 


Calc. u, | 0 | 0.3333 | 0.6660 | 0.9940 | 1.274 | 1.076 | —3.904 


True u, | 0 | 0.3333 | 0.6667 | 1.0000 | 1.333 | 1.667 2.000 


Further, using backward recurrence with the theoretically exact initial- 
value reformulation 


Ue, — 10.1u, + uu, = —2.7r, 


we know that the term 107" will dominate. This is illustrated by the fol- 
owing catastrophic numerical results. 


Backward recurrence 


r 9 8 7 6 5 4 3 


Calc. u, | 3| 2.667 | 2.340 | 2.063 | 2.300 | 7.667 | 63.97 


Trueu, | 3| 2.667 | 2.333 | 2.000 1.667 1.333 1.000 


Have we any other possible and this time successful reformulation? It 
turns out that whenever the initial-value problems in which either ug and 
u, Or u, and u,.., are specified are both ill-conditioned, then the boundary- 
value problem with uo and u, specified is likely to be well-conditioned. 


It follows that if ug and wu, are specified, then if we know some u, from some 
independent consideration, the best formulation of the problem is to 
ignore u, and solve the boundary-value problem with uo and u, given. In 
our case, for example, with ug = 0, ug = 2, we can write down the recur- 
rence relations for r = 1, 2, ..., 5 as a set of linear simultaneous algebraic 
equations, in which the known up and ug are transferred to the right-hand 
side. The equations are 


r=] —10.1u, +u, = —2.7 
r=2 u,—101u4u4,—- —54 
r=3 u,—10.lu;+u,= —81 
r=4 u,;—10.lu,+us = —10.8 
pu ua — 10.1ug = —15.5 


In Unit 8, Numerical Solution of Simultaneous Algebraic Equations, we 
shall discuss methods for solving linear equations which are guaranteed 
to give good results whenever the problem is well-conditioned (that is, the 
methods have hardly any induced instability), and using such a method, 
with four-figure floating-point arithmetic, we actually produce values of 
u, Which are the correctly rounded versions of the exact solution. 


You might ask what one can do if some u, is not known, but that we do 
know that u, tends to 0 as r increases. What we can do here is analogous 
to previous similar situations. We solve the relevant algebraic equations 
several times, taking wu, — 0 with different values of », and assessing the 
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accuracy of our results by inspection. Taking, for example, ug — 0 and then 
ug = 0, we produce the results: 


f 0 1 2 3 4 5 


(n = 8)u, | O | 0.3334 | 0.6667 | 1.000 | 1.333 | 1.664 


(n — 9)u, | 0 | 0.3334 | 0.6667 | 1.000 | 1.334 | 1.667 


r 6 7 8 9 


(n =8)u, | 1.974 | 2067 | 0 


(n =9)u, | 1.997 | 2.304 | 2.367 | 0 


They show how well-conditioned our boundary-value problem is, and also 
generate considerable confidence in the computed results at least up 
tor=5. 


We have now effectively finished our introduction to the treatment of 
second-order recurrence relations of initial-value. The only missing item 
isa full analysis, corresponding to that of sub-section 7.3.1 for the first-order 
case, for the maximum error in the computed u, obtained with floating- 
point arithmetic for an initial-value problem in which the specified uy and 
u, might have small physical or mathematical uncertainties and in which 
rounding errors are induced in the subsequent arithmetic. This analysis 
is not part of this course, but you will find it on pages 40-42 of Computing 
Methods for Scientists and Engineers by Fox and Mayers (see Bibliography). 


Exercises 
For each of the following problems state: 


(i) whether or not the problem is ill-conditioned; 

(i) how, in each ill-conditioned case, you might successfuliy reformulate 
it. 

l. 2u,4,, — 5u, + 2u,., 20; uo = 0, uy = m. 

2. 2u,,, — Su, + 2u,-, 20; ug = m, u, = $r. 

3. 2u4,, — Su, + 2u,- = — dr, ug =0, u4 = 1. 

4. 2u4; —5u,-2u,., = — dr, uo =0, u, =}. 


(Hint: In 3 and 4 a particülar solution is u, — ir.) 


Solutions 


1. The general solution is u, = ¢,2" + c2(4)', so that the particular 
solution we want, satisfying the given initial conditions, is 
u, —ín(2" — (4)’). This contains a large multiple of the domi- 
nant element of the kernel, so that the problem is relatively 
well-conditioned. 

2. The required solution is u, = n(4)’, but there is a more domi- 
nant kernel element 2". The problem is therefore ill-conditioned. 
Backward recurrence, with u, =0, u,- = 1, multiplied by a 
factor to give the correct us = 7, is a well-conditioned reformu- 
lation because the unwanted element is here decreasing and 
the required solution is increasing. 


The general solution is u, — ir + c,2' -- c; (1), and with 
ug = 0, u, = 1 we find the solution 


u,—ir$0 — 0». 


This contains a significant contribution from the dominating 
element of the kernel, and the problem is relatively well- 
conditioned. 

The required solution is u, = ir. The problem is badly condi- 
tioned, and so is any initial-value reformulation (e.g. back- 
ward recurrence with ug = 2, us =4). The reformulation asa 
boundary-value problem, as in the last part of sub-section 
7.3.4 gives a well-conditioned problem, solvable without 
induced instability. 


7.3.4 Summary of Section 7.3 


Tn this section we defined the terms 


forward recurrence (page C 34) 

backward recurrence (page C 37) 

initial-value problem (page C 41) 

boundary-value problem (page C 41) 
Techniques 


l. For a given linear first-order or homogeneous second-order recur- 
rence relation, analyse the inherent and induced instabilities. Where 
relevant avoid such instabilities by problem reformulation (e.g. recur 
in the reverse direction). 

2. For a given initial-value second-order linear recurrence relation, 
determine whether it is ill- or well-conditioned. In ill-conditioned cases, 
reformulate the problem (where possible) as a boundary-value problem. 


Notation 


ii, 


(page C 35) 
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7.4 SUMMARY OF THE UNIT 


In this unit we have discussed the following items. 


I. 


The care needed in the formulation and solution of problems when 
numerical answers are required. 


2. The difference between physical problems, in which some of the data 
are uncertain, and mathematical problems in which the data are exact 
but cannot be stored exactly in a computer. 

3. The possibilities of inherent instability, so that few worthwhile figures 
can be quoted in the answer to a physical problem, and that if many 
figures are required in a mathematical problem (which is a meaningful 
request) much work might be involved. 

4. The possibilities of induced instability caused by rapid accumulation 
of rounding errors in computer arithmetic. ` 

5. The way the machine stores numbers and performs the four basic 
arithmetic operations in floating-point arithmetic. 

6. The nature of linear recurrence relations, how they might appear in 
practice, and how to find some solutions for (i) general first-order 
equations and (ii) second-order equations with constant coefficients. 

7. A full error analysis for the numerical solution of recurrence relations 
of first-order, revealing both inherent and induced instability. The 
possibility of avoiding both instabilities by problem reformulation; in 
particular, by recurring in the reverse direction. Similar treatment, 
without full error analysis, of homogeneous second-order recurrence 
relations with given initial conditions. 

8. The impossibility of avoiding ill-conditioning for some solutions of 
certain non-homogeneous initial-value second-order equations and the 
importance of reformulation, where possible, as a boundary-value 
problem. 

Definitions 

The terms defined in this unit and page references to their definitions are 

given below. 

ill-conditioned problem (page C 7) 
well-conditioned problem (page C 7) 
inherent instability (page C 7) 
magnification factor (page C 7) 
absolute condition number (page C 7) 
relative condition number (page C 8) 
induced instability (page C 10) 
floating-point arithmetic (page C 12) 
relative error (page C 12) 
recurrence relation (page C 22) 
linear recurrence relation (page C 22) 
linear difference operator (page C 29) 
order of a recurrence relation (page C 30) 
forward recurrence, (page C 34) 
backward recurrence (page C 37) 
initial-value problem (page C 41) 
boundary-value problem (page C 41) 
Techniques 


1. 
2. 


3. 
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Use condition numbers to predict ill-conditioning. 

Perform calculations in four-figure floating-point arithmetic and 
analyse the errors involved. 

Consider various methods for a given problem, choosing, if possible, 
the most stable and most economic one (as exemplified in the case 
study). 


t+ + te OF 


* 


* 
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4. Solve first-order linear recurrence relations and second-order linear s+ * 
recurrence relations with constant coefficients. 
5. Fora given linear first-order or homogeneous second-order recurrence tra 


relation, analyse the inherent and induced instabilities. Where relevant 
avoid such instabilities by problem reformulation (e.g. recur in the 
reverse direction). 

6. For a given initial-value second-order linear recurrence relation, ttx 
determine whether it is ill- or well-conditioned. In ill-conditioned cases, 
reformulate the problem (where possible) as a boundary-value problem. 


Notation 
E (page C 6) 
x (page C 6) 
X (page C 6) 
fi(x) (page C 12) 
Pe (page C 12) 
r (page C 14) 
R? (page C 28) 
E (page C 29) 
ti, (page C 35) 
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7.5 SELF-ASSESSMENT 


Self-assessment Test 


This Self-assessment Test is designed to help you test quickly your under- 
standing of the unit. It can also be used, together with the summary of the 
unit for revision. The answers to these questions will be found on the next 


non-facing page. We suggest you complete the whole test before looking at 
the answers. 


1. 


For which values of x are the following problems ill-conditioned ? 


(Use as your criterion of ill-conditioning a relative condition-number 
exceeding 1.) 


(i) Calculation of x?. 
Gi) Calculation of x'/?. 
(iii) Calculation of cos x. 


Explain in about 60 words the difference between induced and inherent 

instability. 

(i) A number in the range [0, 100] is fed into a computer which uses 
four-digit floating-point arithmetic. What is the maximum 
absolute error in the stored value (the inherent “ mathematical” 
error)? 

(i) The stored values of two such numbers are X and y. What are the 
additional (induced) maximum absolute errors in the computa- 
tion of (a) X + y, (b) x — y? 


1 

The integrals J, = í x?'e7* dx, for r — 0, 1..., satisfy the recurrence 
o 

relation 


Describe, but do not carry out, a method of computing the value of I; 
to four figures which has reasonable accuracy using four-figure floating- 
point arithmetic. 

(Hint I, decreases as r increases, and Jo ~ 0.7.) 


Suppose that we have calculated 7, (of Question 4) by some indepen- 
dent method, such as numerical integration using Simpson's rule, to an 
accuracy (largest possible error) of 4 x 1074. If we now calculate J, 
by forward recurrence from the recurrence relation of Question 4, 
what is the resulting accuracy in the computed J,, assuming that all 
the arithmetic in the computation from the recurrence relation is per- 
formed exactly (that is, there is only an inherent error in J) and no 
subsequent induced error)? 


Find the general solutions of the recurrence relations 

() u,+1 — 3u, + 2u,-, =0 

Gi) u, = up, — 0.09u, -2 + 0.09 

We want to compute the solution y = u, of the recurrence relation * 
3u,4, — Tu, -2u,-., = 1 — 2r, for r =0,1,..., 20, 

and we have the following possible formulations of the problem. 


(a) Specification of uy and 4. 
(b) Specification of uo and us. 
(c) Specification of ug and 20. 


Which formulation would you choose, and why? 
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Solutions to Self-assessment Test 


1. For the calculation of f(x), the relative condition number is defined as 


k = |F O, 
and if this exceeds 1 the problem is relatively ill-conditioned. 
() f(x) 9 x^, f(x) = 2x, so that k =2 and the problem is relatively ill- 
conditioned for any value of x. 
(ii) S) =x, f'(x) =4x7'7, so that k=4 and the problem is 
relatively well-conditioned for any value of x. 


(ii) f(x) = cos x, f(x) = — sin x, so that k = |x tan x| and the problem 
is relatively ill-conditioned for those values of x for which 
|x tan x| » 1. » 


2. Induced instability means that the method of solution is unsatisfactory, 
in that any rounding errors in the arithmetic accumulate to give a poor 
result even in a well-conditioned problem. 


Inherent instability, or ill-conditioning, means that small changes in the 
data of the problem cause large changes in the solution. This is indepen- 
dent of the method of solution. 


3. (i The number is stored as 10° x a, where a is a four-figure number 
O.xxxx with first digit non-zero. So the maximum storage error 
is 10° x 0.5 x 107^. Since the number is smaller than 100, b is at 
most 2, so that the maximum absolute storage error is 0.5 x 107? 
= 0.005. 

(ii) Since in floating-point arithmetic the sum or difference is formed 
to "double length" and then rounded to single length, the 
error is again just 10° x 0.5 x 107^, where the result of the 
operation is 10* x a. 

(a) In the addition the result may exceed 100, b might be 3, and 
the maximum error is then 0.5 x 107! — 0.05. 

(b) In subtraction, both numbers being positive, the result is 
smaller than 100, so that b < 2 and the maximum rounding 
error is 0.005. 


4. Since J, decreases as r increases, and since the kernel element, 
$x3x-x (r— +), increases as r increases, we expect to be able to com- 
pute Jo, starting with any J, for large n, and recurring backwards with 


Lie 
ped r—4 * 


Rounding errors do not accumulate, and no special arithmetic precautions 
are necessary, so that a four-figure approximation to 4e~! should suffice 
for four-figure accuracy everywhere. 


Apart from rounding errors, the error in the computed J is the multiple 
j : 
(n—3(n-3):-Q) 


times the error in 7,. Since Jọ = 0.7 and J, decreases with r, the error in I, 
can hardly exceed 0.7. For the error from this source to be less than 
5 x 107? (for four-figure accuracy) we therefore choose n so that 


0.7 x: 
b-De-pog rm 


The major rounding error is in the last step, with r = 1, and is therefore 
twice the error in J, + the error in the stored value of et. 


contributing at most 13 units in the fourth figure of I. 
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5. 


If we know J, with a possible error of £, then the possible error in J,, 
from this source alone (that is with no further mistakes in the arithmetic) is 


3 
xox cee x ll, 10395 


2 2° * 328 


x 107* < 0.01. 


NI— 


So there might be an error in the second decimal place of 7, . 


6. 


(i) 


(ii) 


The general solution of 
Ura) — 3u, + 2u,_, =0 


is cipi + c2p3, where p, and p, are the roots of the quadratic 
equation p? — 3p + 2 = 0. That is, p, = 2, p; = 1, and the general 
solution is 


u, = 42" ro, 


where c, and c; are constants. 
The general solution of 


u, =u,- — 0.09u,_ 2 + 0.09 


consists of a particular solution plus the solutions in the kernel. 


The latter are cp; + c;p5, where P, and p, are the roots of the 
quadratic equation 


P^ — p + 0.09 — 0, that is, p, = 0.9, p, — 0.1. 


For a particular solution, examine whether u, = C3, a constant, 
is a solution. Substitution gives 


€3 — c3 + 0.09c, = 0.09, 
„so that c3 = 1 is a solution, and the general solution is therefore 
u, = ¢,(0.9)" + c,(0.1)" + 1. 


7. The solutions in the kernel are CP + C2p5, where p, and p, are the 
roots of the quadratic equation 3p? — 7p + 2 = 0, that is p, 22, p; — 1. 
If a particular solution is r, the general solution is 


u, = C42" cy) +7. 


(a) If ug and u, are specified, we must use forward recurrence, and the 
unavoidable introduction of the term c,2" (through rounding errors) 
will swamp the less rapidly increasing required solution. 

If u29 and wy are specified, we must use backward recurrence, and the 
other solution c;(1)' will now dominate. 

If ug and uzo are specified, we can solve the algebraic equation ob- 
tained from the recurrence relations with r — 1, 2, ..., 19, and we 
have a known stable method for their solution. 


(b) 
(c) 


Formulation (c) is therefore preferred. 
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